Singular Value Decomposition (SVD) is a technique based on linear projection theory that has
been frequently used for data analysis. It constitutes an optimal (in the least-squares sense)
decomposition of a matrix along the most relevant directions of the data variance. Usually this
information is used to reduce the dimensionality of the data set to a few principal projection
directions; this is called Truncated Singular Value Decomposition (TSVD). In situations where
the data is continuously changing, the projection might become obsolete. Since the rate of change
of the data can be fast, it is an interesting question whether the TSVD projection of the initial
data remains reliable. In the case of complex networks, this scenario is particularly important when
considering network growth. Here we study the reliability of the TSVD projection of growing
scale-free networks, monitoring its evolution at global and local scales.
1. Introduction



There exists a vast literature that acknowledges Singular Value Decomposition as a valuable tool for
information extraction from matrix-shaped data. This approach and its truncated variant have been
extraordinarily successful in many applications [Golub & Van Loan, 1996], in particular for the analysis
of relationships between a set of documents and the words they contain. In this case, the decomposition
yields information about word-word, word-document and document-document semantic associations;
the technique is known as latent semantic indexing [Berry et al., 1995] (LSI) or latent semantic analysis



2 Pau Erola et al. 



[Landauer & Dumais, 1997] (LSA). In the field of complex networks, we recently introduced SVD as a
useful tool to scrutinize the modular structure in networks [Arenas et al., 2010].



Remarkably, a common characteristic of these applications is their dynamic nature. In order to attain
successful information retrieval, for instance in a query, LSI or LSA must rely on the fact that the SVD of
textual resources is always up to date. Unfortunately, databases rarely stay the same. Addition and/or
removal of information is constant, meaning that catalogs and indexes quickly become obsolete or
incomplete. Turning to networks, the question is equally pertinent: both natural and artificial networks are
dynamic, in the sense that they change through time (and so do their modular structures). Paradigmatic
examples of this fact are the Internet, the World Wide Web or knowledge databases like Wikipedia: all of
them have been objects of study from a graph-theoretical point of view [Pastor-Satorras & Vespignani, 2004;
Capocci et al., 2006; Zlatić et al., 2006]. Given this realistic scenario, a major question arises, namely, for
how long TSVD stands as a reliable projection of evolving data.

In this paper we study the stability of TSVD as applied to changing networks. In particular, we want to
quantify the differences between successive TSVD projections of evolving networks. To this end we devise
a set of measures of global and local reliability, and apply them to a classical model of network growth, the
Barabási-Albert (BA) scale-free network [Barabási & Albert, 1999]. The BA model consists of a random
network whose formation is driven by two mechanisms: growth (the network starts with a small number of
nodes, and a new one is added at each time step) and preferential attachment (the probability of a new
node i linking to a previously existing node j is proportional to the current degree of node j). This
mechanism yields networks with scale-free degree distributions $P(k) \sim k^{-\gamma}$.
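
The two ingredients of the model can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the code used for the simulations: the function name `grow_ba`, the number of links per new node `m`, and the fully connected seed of `m + 1` nodes are hypothetical choices that the text does not specify.

```python
import numpy as np

def grow_ba(n_final, m=2, adjacency=None, rng=None):
    """Grow an undirected network by preferential attachment.

    n_final   : number of nodes after growth
    m         : links created by each new node (assumed, not given in the text)
    adjacency : optional existing symmetric 0/1 matrix to keep growing
    rng       : seed or numpy Generator, for reproducibility
    """
    rng = np.random.default_rng(rng)
    if adjacency is None:
        # Assumed seed: a small fully connected core of m + 1 nodes.
        adjacency = 1 - np.eye(m + 1, dtype=int)
    adjacency = np.asarray(adjacency)
    n_start = adjacency.shape[0]
    if n_final < n_start:
        raise ValueError("n_final must be at least the current size")
    A = np.zeros((n_final, n_final), dtype=int)
    A[:n_start, :n_start] = adjacency
    degrees = A.sum(axis=1).astype(float)
    for new in range(n_start, n_final):
        # Preferential attachment: P(link to j) is proportional to the degree of j.
        p = degrees[:new] / degrees[:new].sum()
        targets = rng.choice(new, size=m, replace=False, p=p)
        A[new, targets] = 1
        A[targets, new] = 1
        degrees[targets] += 1
        degrees[new] = m
    return A

A0 = grow_ba(1000, m=2, rng=0)                # initial network
A1 = grow_ba(1300, m=2, adjacency=A0, rng=1)  # the same network after 30% growth
```

Passing an existing adjacency matrix back into the function keeps the old nodes and links untouched, which is what allows successive snapshots of one growing network to be compared.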

This work is partially motivated by the application of TSVD to analyze the mesoscale of networks and
its temporal evolution. In [Arenas et al., 2010], the object of analysis is the contribution matrix C of N
nodes to M modules. The rows of C correspond to nodes, and the columns to modules. The analysis of
this matrix is the focus of our research. The element $C_{i\alpha}$ is the number of links that node i dedicates
to module $\alpha$, and C is obtained as the matrix product of the network's adjacency matrix A and
the partition matrix S:

$$C_{i\alpha} = \sum_{j=1}^{N} A_{ij} S_{j\alpha}, \qquad (1)$$

where $S_{j\alpha} = 1$ if node j belongs to module $\alpha$, and $S_{j\alpha} = 0$ otherwise. Note that certain changes in the
topology might not be reflected in the values of C, for example the rewiring of the connections of a node
towards other nodes in the same community.
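
As a small worked illustration of the matrix product in Eq. (1), the following sketch builds the partition matrix from a list of module labels and multiplies it by the adjacency matrix. The function name `contribution_matrix`, the `membership` argument and the four-node example graph are hypothetical, introduced only for this example.

```python
import numpy as np

def contribution_matrix(adjacency, membership):
    """Return C = A S, where S[j, a] = 1 if node j belongs to module a."""
    adjacency = np.asarray(adjacency)
    membership = np.asarray(membership)
    n_nodes = adjacency.shape[0]
    n_modules = membership.max() + 1
    S = np.zeros((n_nodes, n_modules), dtype=adjacency.dtype)
    S[np.arange(n_nodes), membership] = 1
    return adjacency @ S

# Path graph 0-1-2-3 split into two modules {0, 1} and {2, 3}.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
C = contribution_matrix(A, [0, 0, 1, 1])
# Row i of C counts the links of node i towards each module.

# One module per node reproduces the adjacency matrix itself.
C_singletons = contribution_matrix(A, np.arange(4))
```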

To measure the reliability of the TSVD projection of the contribution matrix, here we will consider the 
"worst case scenario" where each node belongs to its own community. This case corresponds mathematically 
to C = A. Establishing that TSVD is robust to change in these circumstances will settle the fact that TSVD 
is robust to change on a coarse-grained structure. 



2. Analysis of networks based on TSVD 

Given a rectangular N x M (real or complex) matrix A, SVD stands for the factorization into the product
of three other matrices,

$$A = U \Sigma V^{\dagger}, \qquad (2)$$

where U is a unitary N-by-N matrix (left singular vectors) that describes the original row entities as
vectors of derived orthogonal factor values; $\Sigma$ is a diagonal N-by-M matrix containing the singular
values, which act as scaling values; and $V^{\dagger}$ denotes the conjugate transpose of V, an M-by-M unitary
matrix (right singular vectors) that describes the original column entities in the same way as U.

A practical use of SVD is dimensionality reduction via truncation (TSVD). It consists
in keeping only the r largest singular values to produce a least-squares optimal approximation of lower
rank. For example, severe dimensional reduction is a condition for success in machine learning
SVD applications [Deerwester et al., 1990; Berry et al., 1995; Landauer et al., 1998].
In the case of a rank r = 2 approximation, the uniqueness of the rank-two decomposition is ensured
if the ordered singular values $\sigma_i$ of the matrix $\Sigma$ satisfy $\sigma_1 > \sigma_2 > \sigma_3$ [Golub & Van Loan, 1996]. This
dimensional reduction is particularly useful for depicting results in a two-dimensional plot for visualization
purposes.
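
The truncation step can be illustrated with NumPy's `numpy.linalg.svd`. The sketch below is only an example on arbitrary random data: the helper name `truncated_svd` and the 6 x 4 test matrix are assumptions. It checks the least-squares property (the Frobenius error of the rank-r approximation equals the norm of the discarded singular values) and the ordering condition on the first three singular values.

```python
import numpy as np

def truncated_svd(matrix, r=2):
    """Return the rank-r factors (U_r, s_r, Vh_r) and the full singular spectrum."""
    U, s, Vh = np.linalg.svd(np.asarray(matrix, dtype=float), full_matrices=False)
    return U[:, :r], s[:r], Vh[:r, :], s

rng = np.random.default_rng(0)
X = rng.random((6, 4))              # arbitrary example data
U_r, s_r, Vh_r, s = truncated_svd(X, r=2)
X_r = U_r @ np.diag(s_r) @ Vh_r     # rank-2 approximation of X

# Least-squares optimality: the Frobenius error equals the norm of the
# discarded singular values.
frobenius_error = np.linalg.norm(X - X_r)
discarded = np.sqrt((s[2:] ** 2).sum())

# Ordering condition for the rank-2 case: sigma_1 > sigma_2 > sigma_3.
is_unique = bool(s[0] > s[1] > s[2])
```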

The idea we developed in our previous work [Arenas et al., 2010] is to compute the projection of the
connectivity of nodes (rows in A) into the space spanned by the first two left singular vectors. We call this the
projection space $\mathcal{U}_2$, and we denote the projected vector of the i-th node as $\tilde{n}_i$. Given that the
transformation is information preserving [Chu & Golub, 2005], the map obtained gives an accurate
representation of the main characteristics of the original data that is visualizable and, in principle, easier to
scrutinize. It is important to highlight that this approach has essential differences with classical pattern
recognition techniques based on TSVD, such as Principal Components Analysis (PCA) or, equivalently,
Karhunen-Loève expansions. Our data (columns of A) cannot be independently shifted to zero mean without
losing their original meaning; this restriction prevents the straightforward application of the mentioned
techniques.

To interpret the outcome of the TSVD correctly we change to polar coordinates, where for each node i
the radius $R_i$ measures the length of its contribution projection vector $\tilde{n}_i$, and $\theta_i$ is the angle between
$\tilde{n}_i$ and the horizontal axis. Large values of R correspond to highly connected nodes, and $\theta$ reflects the
adjacencies of each node in matrix A. Fig. 1 shows the $R$-$\theta$ planes of an evolving network to give a visual
intuition of the map's stability: as the network grows the mapping is distorted. In the following section we
develop measures to quantify the effect of the growth on the TSVD projection.
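
The change to polar coordinates can be sketched as follows. This is a hedged illustration: it assumes that the coordinates of node i are the i-th row of the first two left singular vectors, which the text describes only in words, and it adds a sign convention of its own because numerical singular vectors are defined only up to a sign. The function name `polar_projection` and the five-node example graph are hypothetical.

```python
import numpy as np

def polar_projection(matrix):
    """Project rows onto the first two left singular vectors and return
    Cartesian coordinates, radii R and angles theta (in radians)."""
    U, _, _ = np.linalg.svd(np.asarray(matrix, dtype=float), full_matrices=False)
    coords = U[:, :2]
    # Singular vectors are defined up to a sign; fix it so that the entry of
    # largest magnitude in each column is positive (a convention chosen here).
    top = np.abs(coords).argmax(axis=0)
    coords = coords * np.sign(coords[top, [0, 1]])
    R = np.hypot(coords[:, 0], coords[:, 1])
    theta = np.arctan2(coords[:, 1], coords[:, 0])
    return coords, R, theta

# Small non-bipartite example: node 0 linked to all others, plus the link 1-2.
A = np.zeros((5, 5))
for i, j in [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]:
    A[i, j] = A[j, i] = 1
coords, R, theta = polar_projection(A)
```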



3. Quantifying the reliability of TSVD on growing networks 

As stated in the introduction, the goal of this research is to test how the TSVD projection, at rank r = 2,
changes when it is computed at different stages of the evolution of BA scale-free networks. This implies that
TSVD will be computed on an initial network of size $N_0$, and then re-computed for successive node additions
up to a final size $N_t = 2N_0$. To quantify the effect of growth on the TSVD projection, we devise two levels of
study: global and local. We will define measures based on the concept of absolute and relative distances
between nodes; to this end we will work in the metric space $\mathcal{U}_2$.



3.1. Global measure 

We propose a global quantity that indicates the amount of change in the position of nodes in the map
obtained by TSVD. In the sequence of computed TSVD projections, the nodes' coordinates in $\mathcal{U}_2$ space
change. This can be quantified by the difference of vectors $\tilde{n}_i$ between the initial and evolved network
projections.

In Fig. 2 we plot the projection of the growing network presented in Fig. 1 on the space $\mathcal{U}_2$. We fix our
attention on two snapshots of the evolution corresponding to growths of 30% and 80%. We compute the
differences between positions of the same nodes at different stages (z) as $v_i = \tilde{n}_i^0 - \tilde{n}_i^z$, producing a field
map that accounts for the changes. This field map is shown in the insets of Fig. 2. When the initial size has
increased by 80%, the vectors $v_i$ are longer than for the 30% increment, which evidences a larger
variability, i.e. a progressive degradation in the TSVD reliability.

The global measure we propose to assess successive changes of the rank r TSVD projection compared to
the initial data is the relative error

$$E_{\mathrm{global}} = \frac{\sum_{i=1}^{N_0} \sum_{j=1}^{r} \left| U^0_{ij} - U^z_{ij} \right|}{\sum_{i=1}^{N_0} \sum_{j=1}^{r} \left| U^0_{ij} \right|}, \qquad (3)$$

where $U^0$ represents the truncated left singular vectors of the original network with $N_0$ nodes, and $U^z$
represents the truncated left singular vectors of the grown network with size $N_z > N_0$.
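
A minimal sketch of this relative error, assuming the two projections are stored as arrays whose rows are node coordinates and whose first rows correspond to the same initial nodes. The names `global_error` and `align_signs` are hypothetical. The sign-alignment helper is an addition that the text does not describe: numerical SVD routines return singular vectors only up to a sign, so the grown projection may need to be flipped before it is compared with the initial one.

```python
import numpy as np

def align_signs(reference, coords):
    """Flip columns of coords so they overlap positively with reference
    on the shared (initial) nodes. Not part of the source; an assumption."""
    n0 = reference.shape[0]
    overlap = (reference * coords[:n0]).sum(axis=0)
    return coords * np.where(overlap < 0, -1.0, 1.0)

def global_error(coords_initial, coords_grown):
    """Sum of absolute coordinate changes of the initial nodes, relative to
    the sum of their absolute initial coordinates."""
    n0 = coords_initial.shape[0]
    numerator = np.abs(coords_initial - coords_grown[:n0]).sum()
    denominator = np.abs(coords_initial).sum()
    return numerator / denominator

# Toy coordinates: two initial nodes, three nodes after growth.
U0 = np.array([[1.0, 0.0],
               [0.0, 1.0]])
Uz = np.array([[-0.5, 0.0],
               [0.0, 0.5],
               [-0.7, 0.7]])
error = global_error(U0, align_signs(U0, Uz))
```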
Fig. 1. Three snapshots of a growing network (left side), their corresponding projections on the $R$-$\theta$ plane (center) and the
$\theta$-overlapping matrices (right side; see [Arenas et al., 2010] for details). For the sake of clarity, the initial set of nodes (a),
N = 1000, is drawn in black; the second snapshot (b) represents a 30% growth in the number of nodes, N = 1300, with new nodes
drawn in red. Finally, (c) represents a network with N = 1800, where the last nodes to arrive are depicted in green. Some nodes
from the initial set (2, 4, 5, 6, 7, 964) have been highlighted in the $R$-$\theta$ plane to give a visual intuition of the map's stability.
Note that nodes with a high value of R (2, 4, 5, 6, 7) remain almost unchanged throughout the topology's growth, whereas node
964 undergoes much change from an absolute point of view. The rightmost matrices illustrate the amount of change of nodes with
respect to their $\theta$ angles: as nodes are added to the structure, the cosine overlaps between them increasingly distort the
original figure.



We have applied this global measure to monitor the evolution of the TSVD stability for growing
networks with initial sizes $N_0 = 1000$ and $N_0 = 10000$. Fig. 3 shows the percentage of relative error with
respect to the original network. In the chart, each successive point represents the addition of a further 5% of
nodes. Up to a 40% growth the global error remains below 10%, and after doubling the network size the
average error still remains below 20%. These results show the reliability of the projection after the growing
process.

Fig. 2. Projection in the $\mathcal{U}_2$ space of the evolving network presented in Fig. 1. The insets for N = 1300 and N = 1800 trace
the vectors between the projected coordinates of each node on the grown network and the original coordinates on the initial
network with $N_0 = 1000$. We use these vectors $v_i$ to quantify the variability of the TSVD. Nodes are colored as in Fig. 1.

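
The measurement protocol (grow the network in 5% steps up to twice its size and compare each projection with the initial one) can be sketched end to end as below. This is a single small run under assumptions, not a reproduction of the reported figures: the initial size of 200 nodes, `m = 3` links per new node, the fully connected seed, the sign alignment, and the choice of rows of the first two left singular vectors as node coordinates are all choices made for this example. The results in the text are averages over 100 simulations on much larger networks.

```python
import numpy as np

def grow(A, n_final, m, rng):
    """Extend adjacency matrix A to n_final nodes by preferential attachment."""
    n_start = A.shape[0]
    B = np.zeros((n_final, n_final))
    B[:n_start, :n_start] = A
    deg = B.sum(axis=1)
    for new in range(n_start, n_final):
        p = deg[:new] / deg[:new].sum()
        targets = rng.choice(new, size=m, replace=False, p=p)
        B[new, targets] = B[targets, new] = 1
        deg[targets] += 1
        deg[new] = m
    return B

def coordinates(A, r=2):
    """Rows of the first r left singular vectors (assumed node coordinates)."""
    U, _, _ = np.linalg.svd(A)
    return U[:, :r]

def global_error(U0, Uz):
    n0 = U0.shape[0]
    Uz = Uz[:n0]
    # Sign alignment between the two projections (an addition, see lead-in).
    Uz = Uz * np.where((U0 * Uz).sum(axis=0) < 0, -1.0, 1.0)
    return np.abs(U0 - Uz).sum() / np.abs(U0).sum()

rng = np.random.default_rng(0)
m, n0 = 3, 200
A = grow(1 - np.eye(m + 1), n0, m, rng)   # initial network with n0 nodes
U0 = coordinates(A)

errors = []
for step in range(1, 21):                  # 5% increments up to 2 * n0
    A = grow(A, n0 + step * n0 // 20, m, rng)
    errors.append(global_error(U0, coordinates(A)))
```

After the loop, `errors[k]` holds the global error after a growth of `5 * (k + 1)` percent for this particular run.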
3.2. Local measure 

Though informative, the previous global quantity can overlook changes at the microscopic level. The
neighborhood of each node in the $\mathcal{U}_2$ plane could undergo changes in the sequence of computed TSVD
projections that are difficult to reveal with the global measure defined above. Thus, we propose a measure
that reflects these local changes using the distances between nodes in a neighborhood. Instead of defining a
sharp border for the neighbors of each node, we propose to use a Gaussian neighborhood that weights the
distances according to a variance parameter $\sigma$.

Fig. 3. Global error on two growing networks with initial sizes $N_0 = 1000$ (above) and $N_0 = 10000$ (below). For each network
we compute the error by increments of 5% of growth. In both cases, the global error is lower than 10% up to the 40% increment
of network size. Each point is the average of 100 simulations.

First, we construct the N x N matrix of distances between any pair of nodes in the network at stage
z as

$$D^z_{ij} = \sqrt{\sum_{k=1}^{r} \left( U^z_{ik} - U^z_{jk} \right)^2}, \qquad (4)$$



where $U^z$ represents the truncated left singular vectors of the network at stage z. These distances reflect
a measure of proximity between nodes, independently of their global position in the map. The neighborhood
is weighted to prioritize the stability of closer nodes over distant ones. To this end, we compute
a matrix of weighted distances $S^z$ using a Gaussian function that establishes a radius of influence as
follows:

$$S^z_{ij} = D^z_{ij} \exp\left( -\frac{\left( D^0_{ij} \right)^2}{2 \left( \sigma R_i^0 \right)^2} \right), \qquad (5)$$



where we have chosen a radius of influence that depends on the node: $R_i^0$ is the modulus of the projected
vector $\tilde{n}_i$ in the initial network, and $\sigma$ is a constant. This radius of influence, proportional to the distance
to the origin, emphasizes nodes with larger R, which are the most connected ones; see [Arenas et al., 2010].
Using different values of $\sigma$ in the Gaussian function we can tune the size of the neighborhood. Fig. 4 shows,
for a network with 1000 nodes, three magnified views of a network projection in $\mathcal{U}_2$ to illustrate the Gaussian
radius of influence.
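
A hedged sketch of the weighted distances follows. It assumes Euclidean distances between node coordinates in the projection plane and a Gaussian weight $\exp(-(D^0_{ij})^2 / (2(\sigma R_i^0)^2))$ built from the initial distances, with width equal to the radius of influence $\sigma R_i^0$ of node i; the exact functional form should be checked against the published equations. The function names and the three-point example are hypothetical.

```python
import numpy as np

def distance_matrix(coords):
    """Euclidean distance between every pair of rows of coords."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def weighted_distances(coords_z, coords_0, sigma=1.0):
    """Distances at stage z among the initial nodes, weighted by a Gaussian
    of the initial distances with node-dependent width sigma * R_i^0.
    Assumed form; nodes exactly at the origin (R_i^0 = 0) are not handled."""
    n0 = coords_0.shape[0]
    D0 = distance_matrix(coords_0)
    Dz = distance_matrix(coords_z[:n0])
    R0 = np.linalg.norm(coords_0, axis=1)
    weights = np.exp(-D0 ** 2 / (2.0 * (sigma * R0[:, None]) ** 2))
    return Dz * weights

# Toy projection with three nodes; node 2 is twice as far from the origin.
coords_0 = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [2.0, 0.0]])
S0 = weighted_distances(coords_0, coords_0, sigma=1.0)
```

Note that the resulting matrix is not symmetric, because the width of the Gaussian depends on the row node. For networks with thousands of nodes the dense N x N arrays become large, so a blockwise computation may be preferable.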

Finally, the local measure of reliability we propose is computed as the relative error

$$E_{\mathrm{local}} = \frac{\sum_{i=1}^{N_0} \sum_{j=1}^{N_0} \left| S^z_{ij} - S^0_{ij} \right|}{\sum_{i=1}^{N_0} \sum_{j=1}^{N_0} \left| S^0_{ij} \right|}, \qquad (6)$$

where $S^0$ and $S^z$ represent the matrices of weighted distances of the original network with $N_0$ nodes and of
the grown network with size $N_z > N_0$, respectively.
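
Given two matrices of weighted distances restricted to the initial nodes, this relative error reduces to a ratio of sums. The sketch below uses a hypothetical function name, `local_error`, and made-up 2 x 2 matrices purely to show the arithmetic.

```python
import numpy as np

def local_error(S_initial, S_grown):
    """Sum of absolute changes in weighted distances, relative to the sum of
    the absolute initial weighted distances (initial nodes only)."""
    S_initial = np.asarray(S_initial, dtype=float)
    n0 = S_initial.shape[0]
    S_grown = np.asarray(S_grown, dtype=float)[:n0, :n0]
    return np.abs(S_grown - S_initial).sum() / np.abs(S_initial).sum()

# Made-up weighted distances between two nodes before and after growth.
S0 = np.array([[0.0, 2.0],
               [2.0, 0.0]])
Sz = np.array([[0.0, 2.5],
               [1.5, 0.0]])
error = local_error(S0, Sz)   # (0.5 + 0.5) / 4
```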
Fig. 4. From top to bottom, we present the radius of influence for $\sigma = 0.1$ (red), $\sigma = 1$ (green) and $\sigma = 10$ (blue). At the
bottom of each chart we have plotted, for the nodes highlighted in yellow, the Gaussian curves that we applied to matrix $D^z$
to compute the matrix of weighted distances $S^z$.



Fig. 5 shows the local error measured on two growing networks by increments of 5% of growth. Their
initial sizes are $N_0 = 1000$ (left) and $N_0 = 10000$ (right). For each network we compute the relative error for
$\sigma = 0.1$, 1 and 10. When $\sigma = 0.1$ only the closest neighbors have a significant weight in the measurement
of the local error. These low values of $\sigma$ give the neighbor-wise error a very local sense. On the other
hand, when $\sigma = 10$ the Gaussian curve becomes flat and the measure is affected by perturbations of the entire
network, i.e. every node is equally considered as belonging to the neighborhood. Despite this global
neighborhood for high $\sigma$ values, the local error measure represents a relative distance to each node and, as
we see, after doubling the network size the average error remains below 0.1%. These very low error rates
ensure a good reliability of the projection from a local point of view.

Fig. 5. Local error on two growing networks with initial size $N_0 = 1000$ (left) and $N_0 = 10000$ (right). For each network we
compute the relative error for $\sigma = 0.1$, 1 and 10 by increments of 5% of growth. For small values of $\sigma$ the error is lower, but
in all settings the reliability of the projection is high. Each point is calculated with 100 simulations.

4. Conclusions 

In this article we have raised the question of the reliability of a standard linear projection technique such
as SVD. The question is pertinent because SVD, and in particular its truncated version (TSVD), lies
at the heart of some methodologies that aim to extract useful and reliable information from dynamic
data, i.e. data that is constantly undergoing change. We focus on growing scale-free networks.

We tackle the problem from two complementary points of view. At the large-scale level, we monitor 
average changes in nodes' TSVD projections. This means that each node's projection is compared against 
itself on successive changes. 

Note, however, that success in practical applications of TSVD depends mostly on neighborhood
stability. In other words, coherence of the output when data has undergone changes relies on the fact that the
surroundings of a projected node are similar to those before the changes happened. From a mathematical
point of view, this merely implies that projections change in a coordinated way, such that relative
positions are stable. Keeping this in mind, the local measure developed above captures this facet of the
problem by comparing the evolution of a node's position not against itself, but rather against the rest of
the nodes. Furthermore, we introduce a parameter to weight this variation depending on the distance from
the node of interest. This tunable parameter allows for a finer observation of neighborhood stability, ranging
from immediate neighborhood measures to far-reaching areas. Note that the local measure is orders of
magnitude lower than the global one. This points to the fact that, although the projection changes
significantly, displacements in the plane $\mathcal{U}_2$ are similar in magnitude and direction on average. In other
words, as a node of the network grows following preferential attachment, it is highly likely that its
neighbors also increase their weight, staying close together.

Results indicate that TSVD projections are very robust against data growth. From a global point
of view, an addition of 40% of new data implies only an average change of 10% from initial conditions.
Doubling the number of nodes in a network implies a modification of 15% in the positions of the set of
initial nodes. More importantly, changes at the local level (neighborhood) remain below 0.1% even in the
most demanding case.

Such results have been obtained with rather large structures ($N_0 = 10^3$ and $N_0 = 10^4$), which at
the end of the process have doubled their initial size. This ensures that TSVD is reliable in a wide range
of situations. On the other hand, our study focuses on a particular network model (BA) in which time
plays an important role: the later a node appears, the lower its chances of becoming an important one (a
hub). We anticipate that the emergence of important entities at late stages of evolution would surely disrupt
TSVD projections in a more significant way. Nonetheless, we stress that growing systems typically develop
smoothly, so our conclusions can be safely held.

Finally, we can briefly relate these results to the original motivation of the manuscript, that is, a
scenario where the modular structure of networks is taken into account. In that situation, the stability
of a TSVD map in the case of network changes is guaranteed given the results reported above. Then, the
characterization of the role of nodes and modules in terms of SVD's output can be safely regarded as
faithful even in the case of severe changes in the underlying topology.